Abstract. In this paper, we introduce the following problem in the theory of algorithmic self-assembly: given an input shape as the seed of a tile-based self-assembly system, design a finite tile set that can, in some sense, uniquely identify whether or not the given input shape (drawn from a very general class of shapes) matches a particular target shape. We first study the complexity of correctly identifying squares. Then we investigate the complexity associated with the identification of a considerably more general class of non-square, hole-free shapes.

1 Introduction 

As amazingly complex as biological organisms are, at the nanoscale they are composed of "simple" pieces that spontaneously self-assemble. Self-assembly is a bottom-up process by which a relatively small group of fundamental components combine according to local rules in order to form a complex structure. This very basic process is responsible for the vast diversity and complexity of life, from the simplest single-cell organisms to human beings.

Inspired by nature, scientists have developed and studied a wide variety of artificial self-assembling systems in order to produce structures as varied as nanowires [33], crystals [15], nanofiber scaffoldings [14], landscapes for nanoscale robots [13,19] and dozens of other novel supramolecules (see [6,22,27] for more examples). In addition to experimental work, there has also been a plethora of theoretical work on the design and analysis of the complexities and limitations of self-assembling systems, with notable examples including [8,12,20,23].

Much of the research in algorithmic self-assembly (both theoretical and experimental) can be loosely categorized into four "genres": the self-assembly of shapes [24,26], evaluating computable functions to direct nanoscale self-assembly [17,32], replicating input shapes [1], and creating novel materials that have various chemical properties [34]. In this paper, we introduce a novel (theoretical) self-assembly problem that is motivated not only by the behavior of biological systems but also by the practical need to verify artificial laboratory-based self-assembly systems. We call this new problem the shape identification problem, and define it as the task of designing a tile-based self-assembly system that positively identifies a target structure that has a pre-specified shape (and size) from among possible "junk" structures drawn from a very general pool of objects.

Motivation: Shape identification is a fundamental process of nature and is explicitly used by biological systems in a variety of ways. First and foremost, the immune system generates complexes whose express purpose is to selectively identify, and ultimately bind to, precisely shaped locations on the surface of foreign objects in order to mark them for destruction (by, for example, killer T cells). Also, cellular transport systems, such as those which transport amino acids or sugars, work by moving specifically shaped molecules from one side of a membrane to the other. Furthermore, the power of a self-assembling system (natural or artificial) ultimately arises from the information encoded in its constituent components. In the notable case of proteins, it is the information embedded in their precise three-dimensional geometry that allows them to match and combine with the necessary specificity to build the fundamental building blocks of life.

The ability to correctly identify only the completely formed products of an artificial self-assembly system is also of extreme importance to practitioners. Unfortunately, accomplishing this task is difficult because the self-assembly environment is often variable and chaotic, so mistakes are likely and partially formed products are common. Current methods of imaging the results of nanoscale self-assembling systems provide insufficient resolution for automated visual inspection of assemblies and require error-prone manual inspection (for instance, by poring over atomic force microscope images). Methods such as gel electrophoresis allow for the separation of products based loosely on their mass and shape, but unfortunately with far less shape specificity than desired. With accurate nanoscale shape identification schemes, however, the accuracy of the techniques that experimenters use to identify the products of self-assembling systems could be improved dramatically.

In this paper, we formulate the shape identification problem in algorithmic self-assembly (defined formally in Section 2.1) and exhibit a variety of solutions thereof while working in the RNAse enzyme model, a discrete mathematical model of two-handed tile-based self-assembly (based on Winfree's abstract Tile Assembly Model [23,29]) that distinguishes DNA tiles from RNA tiles and permits the usage of an RNAse enzyme that dissolves all of the RNA tiles in a given assembly. This model was initially suggested by Rothemund and Winfree in the final section of [23] and formally defined by Abel, Benbernou, Damian, Demaine, Demaine, Flatland, Kominers and Schweller [1]. We focus our attention on the design of "small" tile sets that identify certain types of target shapes by tagging them with a border of DNA tiles. Note that the borders, which signify positive identification, could also be "functionalized" with binding sites that facilitate the easy extraction of only the correct assemblies. It is worth noting that, while the results presented in this paper are based on tile-based self-assembly systems identifying tile-based assemblies, the underlying principles of this paper are applicable to the identification of any type of precisely shaped object (e.g., a DNA origami complex [22]) so long as its perimeter advertises the necessary binding domains, which, in the case of this paper, are single-stranded DNA sequences.

Statement of Results: In Section 3, we exhibit a planar tile assembly system (a.k.a., a system in which all supertiles have obstacle-free paths to their mates and therefore require the use of only two spatial dimensions; see [9] for additional examples of planar self-assembly systems) capable of identifying any $n \times n$ square using $O(\log n)$ unique tile types. We then use a well-known optimal encoding scheme [5, 26] to reduce the number of unique tile types in the aforementioned construction (while preserving planarity) to $O\left(\frac{\log n}{\log\log n}\right)$.

We subsequently prove a matching lower bound on the minimum number of unique tile types required to identify an $n \times n$ square. This implies that the complexity of identifying an $n \times n$ square in a restricted RNAse enzyme model coincides exactly with that of its self-assembly in the abstract Tile Assembly Model [5]. We conclude Section 3 with an $O(1)$-size planar tile assembly system that "universally" identifies whether or not any hole-free input shape is an $n \times n$ square. In Section 4, we develop a non-planar tile assembly system that identifies a wide variety of "hole-free" shapes that have a kind of "perimeter-rectangle decomposition" and that uses an optimal number of unique tile types in the sense of Kolmogorov complexity. We then mildly extend the aforementioned result to identify a more general class of shapes, assuming the use of two different types of RNAse enzymes is permitted.

2 Preliminaries and Notation 

Please see Section 7.1 for a brief description of the two-handed Tile Assembly 
Model and Section 7.2 for a description of the RNAse enzyme extension to it. 

2.1 Formulation of the Shape Identification Problem in the RNAse Enzyme Model

Fix a temperature $\tau \in \mathbb{N}$. For every shape (a.k.a., a finite, connected subset of $\mathbb{Z}^2$) $X$, define the assembly $\alpha_X$ as the placement of specially designated seed (DNA) tiles at every point in $X$ subject to the restrictions that $\alpha_X$ must be $\tau$-stable, the strengths of all of the "external" glues must be 1 and there should be no way to determine "corner tiles" of $\alpha_X$ (this latter restriction is accomplished by assuming that all external glues of $\alpha_X$ are labeled with the empty string).¹ We are now ready to define the shape identification problem in self-assembly. Fix some class of shapes $C$ along with a target shape $X \in C$. The goal is to design a finite set of tile types $T_X$ (that does not contain any of the tile types that appear in the seed assembly $\alpha_X$) satisfying the following condition: given an input shape $Y \in C$ encoded as the seed assembly $\alpha_Y$, if $X = Y$ then the system $(T_X, \alpha_Y, \tau)$ uniquely produces a fully connected final assembly $\alpha$ consisting of $\alpha_Y$ with a fully connected ring of "border" tiles along the border of $\alpha_Y$; if $X \neq Y$, then $\alpha_Y$ is the uniquely produced terminal structure. If it is possible to accomplish such a task (i.e., design such a tile set $T_X$), then we say that $T_X$ identifies the shape $X$ with respect to $C$. Note that since we must add the RNAse enzyme last, the border tiles must necessarily be DNA tiles so that they are not simply dissolved away at the end. We say that a shape $X$ can be identified with respect to $C$ if there exists a finite set of tile types $T_X$ that can identify it with respect to $C$.

¹ We speculate that one possible molecular implementation of this might be achieved using Rothemund's DNA origami as a seed structure [7, 22] to which DNA and RNA tiles can subsequently attach.




Fig. 1: The desired outcome for a "yes" instance of the shape identification problem. (a) Target shape $X$ and input shape $Y$; (b) seed assembly $\alpha_Y$; (c) the goal!



An example of an instance of the shape identification problem (for some shape with respect to some class of shapes) is depicted in Figure 1. We say that the identification complexity of $X \in C$ with respect to $C$ is the minimum number of tile types necessary to identify it with respect to $C$ (this is analogous to the tile complexity of a shape $X$, which is defined as the minimum number of tile types necessary to uniquely produce $X$).

3 Identification of $n \times n$ Squares with Planarity

In this section, we exhibit two planar self-assembly systems (a.k.a., systems in which all supertiles have obstacle-free paths to their mates and therefore require use of only two spatial dimensions; see [9] for more discussion of planarity) that efficiently identify $n \times n$ squares with respect to the set of all hole-free shapes: shapes whose complements are infinite, connected subsets of $\mathbb{Z}^2$. We also construct a universal tile set that is capable of identifying whether a given input shape is in fact a square of any dimension.

For each $n \in \mathbb{N}$, let $S_n = \{0, 1, \ldots, n-1\}^2$ be the $n \times n$ square whose lower-left corner is positioned at the origin. Throughout this paper, $C$ denotes the class of all hole-free shapes. We define $C$ this way because we want our constructions to be able to distinguish a target shape from among many different possible "junk" (i.e., non-square) input shapes. The temperature for all of our constructions in this paper is $\tau = 4$.
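The definitions of $S_n$ and of the class $C$ can be made concrete with a small Python sketch. It assumes that shapes are finite sets of integer points and that "connected" means 4-adjacency (the text does not spell out the adjacency); the helper names `square`, `is_connected` and `is_hole_free` are illustrative only and are not part of any tile construction. Because the complement of a finite shape is infinite, the sketch flood-fills only the shape's bounding box padded by one cell.

```python
from collections import deque

Cell = tuple[int, int]


def square(n: int) -> set[Cell]:
    """S_n = {0, ..., n-1}^2, the n x n square with lower-left corner at the origin."""
    return {(x, y) for x in range(n) for y in range(n)}


def is_connected(cells: set[Cell]) -> bool:
    """Breadth-first search using 4-adjacency; the empty set counts as disconnected."""
    if not cells:
        return False
    start = next(iter(cells))
    seen, queue = {start}, deque([start])
    while queue:
        x, y = queue.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in cells and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(cells)


def is_hole_free(shape: set[Cell]) -> bool:
    """True if the complement of `shape` in Z^2 is connected.

    Every complement cell outside the shape's bounding box can reach the
    one-cell padding ring, and the ring itself is connected, so it suffices
    to test connectivity of (padded box - shape).
    """
    xs = [x for x, _ in shape]
    ys = [y for _, y in shape]
    box = {(x, y)
           for x in range(min(xs) - 1, max(xs) + 2)
           for y in range(min(ys) - 1, max(ys) + 2)}
    return is_connected(box - shape)
```

For example, `is_hole_free(square(7))` holds, while `is_hole_free(square(3) - {(1, 1)})` does not, because the removed centre cell is enclosed by the shape.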



3.1 Planar Identification of $n \times n$ Squares with $O(\log n)$ Unique Tile Types

Our first main result of this section is the following theorem, which states that there is an efficient planar identification scheme for $n \times n$ squares.

Theorem 1. For all $6 < n \in \mathbb{N}$, the identification complexity of $S_n$ with respect to $C$ is $O(\log n)$.

The proof idea of Theorem 1 is as follows. Suppose we are trying to identify $S_n$ for some $6 < n \in \mathbb{N}$. Given an input shape $Y \in C$, our construction first attaches "verification modules" to the north-, south- and west-facing sides of $Y$ (if $Y$ is an $n \times n$ square, then there will be exactly one of each of these types of sides). These modules are side-by-side pairs of unary counters and binary counters that do not interact with each other as they count. The unary counters count (in unary) the length of the side to which they are attached and the binary counters essentially count (in binary) up to $n$ (the dimension of the target square). Each verification module compares the length of the side to which it is attached with $n$. If all three verification modules report success and agree with each other, then the input shape is in some sense "almost" a square. The three verification modules then cooperate to allow DNA border tiles to start attaching to the east-facing side of the input shape. If border tiles can attach to all but the two bottom rightmost points along the east-facing side, then the input shape is in fact $S_n$ and our construction reaches an intermediate terminal state. At this point, we add the RNAse enzyme, leaving only the input shape with the east-facing border tiles attached to it. The remaining border tiles attach in a clockwise fashion until a complete and fully connected border is assembled. However, if not all east-facing border tiles can attach (i.e., the input shape is not $S_n$), then after the RNAse enzyme is added all previously attached border tiles will disassociate one at a time until no tiles are attached to the input shape.

Please see Section 7.3 for a more detailed explanation of this construction. 



3.2 $\Omega\left(\frac{\log n}{\log\log n}\right)$ Unique Tile Types are Necessary to Identify an $n \times n$ Square

In [23], Rothemund and Winfree established an $\Omega\left(\frac{\log n}{\log\log n}\right)$ lower bound on the number of tile types required to uniquely assemble an $n \times n$ full square for almost all $n$. In this section, we adapt their information-theoretic proof technique to the shape identification problem for $n \times n$ squares under the RNAse enzyme model.

Theorem 2. For almost all $n \in \mathbb{N}$, the identification complexity of $S_n$ with respect to $C$ is $\Omega\left(\frac{\log n}{\log\log n}\right)$.



Fig. 2: An example of our construction for Theorem 1 with $n = 7$. Our tile set is partitioned into several logical groups, each given a different color in this figure to represent the relative order in which they assemble (i.e., Red, Orange, Yellow, Green, Blue, Indigo, Violet). First, the red supertiles assemble and attach to the corners of the input shape. The orange group essentially encodes the length of the to-be-identified square via a binary counter and requires $O(\log n)$ unique tile types. The "U" border tiles attach along the east-facing side of the input shape. All tiles are RNA tiles except for the "U" tiles and, of course, the tiles that make up the initial seed square.

Proof (sketch). For each $n \in \mathbb{N}$, define the Kolmogorov complexity of $n$ as $K(n) = \min\{|\pi| \mid U(\pi) = n\}$, where $U$ is some fixed universal Turing machine. The reader is encouraged to consult [28] for a more detailed discussion of Kolmogorov complexity. An easy application of the pigeonhole principle tells us that for almost all $n \in \mathbb{N}$, $K(n) = \Omega(\log n)$.
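The pigeonhole step can be spelled out as follows. This is the standard counting argument (binary programs, logarithms base 2), sketched here for convenience rather than quoted from [28]; $m$ and $c$ are free parameters introduced only for this derivation. Since each program outputs at most one number, the number of $n$ with $K(n) < m - c$ is at most the number of programs shorter than $m - c$ bits:

```latex
\begin{align*}
\#\{\pi : |\pi| < m - c\} &= \sum_{\ell=0}^{m-c-1} 2^{\ell} = 2^{m-c} - 1, \\
\frac{\#\{n : 2^{m-1} \le n < 2^{m},\ K(n) < m - c\}}{2^{m-1}}
  &\le \frac{2^{m-c} - 1}{2^{m-1}} < 2^{1-c}, \\
K(n) \ge m - c &> \log n - c \quad \text{for every other } n \text{ in } [2^{m-1}, 2^{m}).
\end{align*}
```

Thus, for any fixed $c$, all but a $2^{1-c}$ fraction of the numbers in each range $[2^{m-1}, 2^m)$ satisfy $K(n) > \log n - c$.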

Note that for a given $n \in \mathbb{N}$ and temperature $\tau \in \mathbb{N}$, there exists a constant-size Turing machine $M$ that takes as input a tile set $T_n$ that uniquely identifies $S_n$ and a seed assembly $\alpha$ representing the input shape (as discussed in Section 2.1), and outputs the maximum extent (height or width) of the uniquely produced terminal assembly. We can then use $M$ as a subroutine in another constant-size Turing machine $N$ that takes as input $T_n$ and sequentially simulates $M$ on $T_n$ with the seed assembly $\alpha_{S_i}$ for $i = 1, 2, \ldots$ in order, while checking whether the maximum extent (height or width) of the $i$th uniquely produced terminal assembly is $i + 2$. Since $T_n$ uniquely identifies $S_n$, we are guaranteed that this search will eventually terminate, at which point $N$ halts and outputs $i = n$. This implies that the size of (number of bits in) the encoding of $T_n$ must be $\Omega(\log n)$. Since we can encode an arbitrary tile set $T$ with $O(|T| \log |T|)$ bits (assuming $T$ has a diagonal strength function), it follows that $|T_n| \log |T_n| = \Omega(\log n)$, and hence $|T_n| = \Omega\left(\frac{\log n}{\log\log n}\right)$ for almost all $n$.



3.3 Planar Identification of $n \times n$ Squares with $O\left(\frac{\log n}{\log\log n}\right)$ Unique Tile Types

The construction for Theorem 1 can be modified to prove the following asymptotically optimal result for the identification of $n \times n$ squares.

Theorem 3. For all $6 < n \in \mathbb{N}$, the identification complexity of $S_n$ with respect to $C$ is $O\left(\frac{\log n}{\log\log n}\right)$.

Consult Section 7.4 for more details.

3.4 Universal Planar Identification of Squares with $O(1)$ Unique Tile Types

In the previous subsections, we focused our attention on the problem of identifying $n \times n$ squares for particular values of $n$ from among any input shape drawn from the set of all hole-free shapes. We now study the related problem of universally identifying whether or not a given input shape is an $n \times n$ square for some $n \in \mathbb{N}$. Here, we are given an arbitrary hole-free input shape and we wish to correctly identify it (in the sense of tagging its border with special tiles) if and only if it is in fact a square.

Theorem 4. There exists a planar (universal) tile set $T$ with $|T| = O(1)$ such that, for all $6 < n \in \mathbb{N}$, $T$ identifies $S_n$ with respect to $C$.

Intuitively, we prove Theorem 4 by constructing a constant-size tile set that (1) grows unary counters off of the north, west and south sides of the input shape and then (2) allows a border of DNA tiles to assemble if and only if all of the counters agree on the same value (in addition to the right side of the input shape being consistent with that of a square). Please see Section 7.5 for a more detailed discussion.

Note that planarity is not necessary for any of the results presented in this section.

4 Non-Planar Identification of More Shapes



We now exhibit a non-planar self-assembly system that efficiently identifies a wide variety of shapes with respect to the set of all hole-free shapes, at the expense of planarity. We first define some notation.



Fig. 3: An example of a perimeter-rectangle decomposition of a particular shape. (a) An example shape $X$; (b) a valid perimeter-rectangle decomposition of $X$.

Let $a = (x, y) \in \mathbb{Z}^2$ and $b = (w, z) \in \mathbb{Z}^2$ and define $d_\infty(a, b) = \max\{|x - w|, |y - z|\}$. If $X$ is a shape, then we say that the feature size of $X$ is the minimum $d_\infty(a, b)$ such that $a$ and $b$ are on two non-adjacent edges of $X$. We say that a shape $X$ is x-monotone if its intersection with any vertical line is a connected line segment. If $X$ is a shape, then let $R(X)$ be the smallest rectangle that contains $X + \{(0, 3), (0, -3)\}$, where, for any set $A \subseteq \mathbb{Z}^2$, $X + A = \{x + a \mid x \in X \text{ and } a \in A\}$. We say that $X$ has a perimeter-rectangle decomposition, denoted as $\{R_i\}_{i=0}^{n-1}$ for some $n \in \mathbb{N}$, if: for each $0 \le i < n$, $R_i$ is a rectangle; for all $0 \le i, j < n$, $i \neq j \implies R_i \cap R_j = \emptyset$; $\mathrm{height}(R_i) < 2^{\mathrm{width}(R_i) + 3}$ for each $i$; $R(X) - X = \bigcup_{i=0}^{n-1} R_i$; and for each $0 \le i < n$, the perimeter of $R_i$ intersects the perimeter of $R(X)$. For any rectangle $R$, we write $h(R) = \mathrm{height}(R)$ and $w(R) = \mathrm{width}(R)$. See Figure 3 for an example of a shape and a valid perimeter-rectangle decomposition thereof. Recall that $C$ is the set of all hole-free shapes.
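The conditions in these definitions can be checked mechanically. The Python sketch below does so for a shape given as a set of integer points, with rectangles given as hypothetical `(x0, y0, width, height)` tuples of unit cells. Two readings are assumptions on our part: the height condition is taken as $h(R_i) < 2^{w(R_i)+3}$, and "the perimeter of $R_i$ intersects the perimeter of $R(X)$" is taken to mean that $R_i$ contains a cell in the outermost ring of $R(X)$. Feature size is not checked.

```python
from itertools import combinations

Cell = tuple[int, int]
Rect = tuple[int, int, int, int]  # (x0, y0, width, height): a hypothetical encoding


def d_inf(a: Cell, b: Cell) -> int:
    """Chebyshev distance max(|x - w|, |y - z|) between a = (x, y) and b = (w, z)."""
    (x, y), (w, z) = a, b
    return max(abs(x - w), abs(y - z))


def is_x_monotone(shape: set[Cell]) -> bool:
    """Every vertical line meets the shape in one contiguous run of cells."""
    columns: dict[int, list[int]] = {}
    for x, y in shape:
        columns.setdefault(x, []).append(y)
    return all(max(ys) - min(ys) + 1 == len(ys) for ys in columns.values())


def cells(r: Rect) -> set[Cell]:
    x0, y0, w, h = r
    return {(x, y) for x in range(x0, x0 + w) for y in range(y0, y0 + h)}


def bounding_rect(shape: set[Cell]) -> Rect:
    """R(X): smallest rectangle containing X shifted up by 3 and down by 3."""
    xs = [x for x, _ in shape]
    ys = [y for _, y in shape]
    x0, y0 = min(xs), min(ys) - 3
    return (x0, y0, max(xs) - x0 + 1, max(ys) + 3 - y0 + 1)


def on_outer_ring(c: Cell, r: Rect) -> bool:
    x0, y0, w, h = r
    return c[0] in (x0, x0 + w - 1) or c[1] in (y0, y0 + h - 1)


def is_perimeter_rectangle_decomposition(shape: set[Cell], rects: list[Rect]) -> bool:
    outer = bounding_rect(shape)
    parts = [cells(r) for r in rects]
    if any(not p for p in parts):                       # each R_i is a non-empty rectangle
        return False
    if any(p & q for p, q in combinations(parts, 2)):   # pairwise disjoint
        return False
    if set().union(*parts) != cells(outer) - shape:     # union equals R(X) - X
        return False
    if any(h >= 2 ** (w + 3) for _, _, w, h in rects):  # assumed height bound
        return False
    return all(any(on_outer_ring(c, outer) for c in p) for p in parts)
```

For the $6 \times 6$ square, the two strips above and below it, `[(0, 6, 6, 3), (0, -3, 6, 3)]`, form a valid decomposition, while the top strip alone does not cover $R(X) - X$.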

Theorem 5. Fix a universal Turing machine $U$. Let $X$ be a shape and $\pi_X$ be any program such that $U(\pi_X) = \langle X \rangle$, where $\langle \cdot \rangle$ is a standard binary encoding of a finite object. If $X$ is x-monotone, has feature size 5 and has perimeter-rectangle decomposition $\{R_i\}_{i=0}^{n-1}$, then the identification complexity of $X$ with respect to $C$ is $O\left(\frac{|\pi_X|}{\log |\pi_X|}\right)$.

Note that by choosing $\pi_X$ to be the shortest program such that $U(\pi_X) = \langle X \rangle$, we have $|\pi_X| = K(X)$. The proof idea of Theorem 5 is as follows. Given a shape $X$ that satisfies the hypothesis, our construction first converts $X$ into a string $xyz$ such that $x$ encodes all of the "north-facing" features of $X$, $y$ encodes $h(R(X))$ and $z$ encodes all of the "south-facing" features of $X$. Our construction then uses this string as a seed in order to assemble a frame to which an input shape can attach. Once the input shape attaches to the frame, a single-tile-wide border assembles around the perimeter of the input shape and fills in completely if and only if the input shape matches the target shape. Once the RNAse enzyme is added, the frame dissolves, and if the input shape has a full border of DNA tiles, then we are done and the target shape has been correctly identified. However, if the input shape does not match the target shape, then our construction ensures that a full border around the input shape is not allowed to assemble. Moreover, if the border is not fully formed after the RNAse enzyme is added, then the partially formed border will disassemble in a counter-clockwise fashion one tile at a time, eventually leaving the input shape completely free of DNA border tiles. We encode $X$ as a program $\pi_X$ using the optimal encoding scheme of Soloveichik and Winfree [26], whence the identification complexity of $X$ with respect to $C$ is $O\left(\frac{|\pi_X|}{\log |\pi_X|}\right)$. Please see Section 7.6 for more details of this construction.
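Why an x-monotone shape can be packed into a string such as $xyz$ is easy to see: the shape is determined by its top (north) and bottom (south) profiles. The Python sketch below uses a hypothetical, deliberately naive text format, not the encoding used in the construction: the north profile, then $h(R(X))$ (the shape's height plus the three extra rows above and below from the definition of $R(X)$), then the south profile. It assumes the shape is connected and x-monotone, so its columns are consecutive and each column is a single run.

```python
Cell = tuple[int, int]


def profiles(shape: set[Cell]) -> tuple[list[int], list[int]]:
    """Top and bottom y-value of each column, left to right, measured from the lowest cell."""
    cols: dict[int, tuple[int, int]] = {}
    for x, y in shape:
        lo, hi = cols.get(x, (y, y))
        cols[x] = (min(lo, y), max(hi, y))
    base = min(lo for lo, _ in cols.values())
    order = sorted(cols)
    top = [cols[x][1] - base for x in order]
    bottom = [cols[x][0] - base for x in order]
    return top, bottom


def encode_xyz(shape: set[Cell]) -> str:
    """Hypothetical 'north|h(R(X))|south' string for a connected x-monotone shape."""
    top, bottom = profiles(shape)
    height_of_r = max(top) + 1 + 6  # shape height plus 3 rows above and 3 rows below
    north = ",".join(map(str, top))
    south = ",".join(map(str, bottom))
    return f"{north}|{height_of_r}|{south}"


def decode_xyz(s: str) -> set[Cell]:
    """Rebuild the shape, translated so that its leftmost column is x = 0 and lowest cell is y = 0."""
    north, _, south = s.split("|")
    top = [int(v) for v in north.split(",")]
    bottom = [int(v) for v in south.split(",")]
    return {(x, y) for x, (b, t) in enumerate(zip(bottom, top)) for y in range(b, t + 1)}
```

Decoding the string recovers the shape up to translation, which is all the frame needs in order to "know" the target.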

5 Non-Planar Identification of Even More Shapes 
Fig. 4: Overview of the construction for Theorem 6. (a) Very high-level overview of the construction for Theorem 6: the grey wedge represents a self-assembly simulation of a Turing machine that unpacks a compact description of all of the rectangles that eventually assemble into a frame that accepts the input shape. (b) After the first type of RNAse is added, all of the supertiles are free to assemble into a frame that accepts the input shape.



Throughout this paper, we have assumed that the RNAse enzyme (the agent responsible for removing all of the RNA tile types) is added once, and only after the initial stage of two-handed self-assembly is allowed to reach a terminal state. Under this assumption, the RNAse enzyme universally dissolves every RNA tile in all of the produced assemblies. In this section, we relax this restriction and allow for the use of two different types of RNAse enzymes in two separate dissolve stages that each affect a different group of RNA tiles. Doing so leads to a mild refinement of Theorem 5, stated precisely as the following theorem.

Theorem 6. Fix a universal Turing machine $U$ and let $X$ be a shape and $\pi_X$ be any program such that $U(\pi_X) = \langle X \rangle$. If $X$ is x-monotone, has feature size 6 and if the use of two different types of RNAse enzymes in two separate dissolve stages is permitted, then the identification complexity of $X$ with respect to $C$ is $O\left(\frac{|\pi_X|}{\log |\pi_X|}\right)$.


The proof idea of Theorem 6 is similar to that of Theorem 5 in that we assemble a frame that accepts an input shape $Y$ and allows a border of DNA tiles to assemble if and only if $Y = X$. In order to overcome the assumption that the target shape $X$ must have a perimeter-rectangle decomposition, we use two dissolve stages. Please see Section 7.7 for more details of this construction.

6 Open Questions 

There are a number of open problems related to using self-assembly for shape identification. First and foremost, a drawback to all of our constructions is that they utilize a system temperature of $\tau = 4$. Although solving the shape identification problem at temperature $\tau = 2$ is impossible by the way the problem is currently formulated, it would be nice to know if there is a solution with temperature $\tau = 3$. Regardless of the system temperature, is it possible to efficiently identify arbitrary hole-free shapes with respect to the set of all hole-free shapes (this is perhaps one of the strongest possible shape identification results one could hope for)? Another interesting research direction might be to study the complexities of identifying various classes of shapes in other models of tile-based self-assembly that allow for the removal of groups of tiles (e.g., the kinetic tile assembly model [30], the multiple temperature model [8,16], the negative glue-strength model [11,21,29], the time-dependent glue-strength model [25], etc.).
7 Appendix

7.1 Informal Description of the Two-Handed Abstract Tile Assembly Model

In this subsection we informally review a variant of Erik Winfree's abstract Tile Assembly Model [29, 30] modified to model unseeded growth, known as the two-handed aTAM, which has been studied previously under various names [1,2,4,8,9,18,31]. In the two-handed aTAM, any two assemblies can attach to each other, rather than enforcing that tiles can only accrete one at a time to an existing seed assembly.

A tile type is a unit square with four sides, each having a glue consisting of 
a label (a finite string) and strength (a natural number). We represent tiles as 
squares. Notches on the sides of tile types represent the glue strength of that 
side. The thick notches represent strength 4, and otherwise each single notch 
contributes a strength of one to the glue strength of that side. See Figure 5 for 
an example of our tile notation. 



Fig. 5: The north glue of this tile has strength 4, the west glue has strength 1, 
the east glue has strength 3 and the south glue has strength 0. 



We assume a finite set $T$ of tile types, but an infinite number of copies of each tile type, each copy referred to as a tile. A supertile (a.k.a., assembly) is a positioning of tiles on the integer lattice $\mathbb{Z}^2$. Two adjacent tiles in a supertile interact if the glues on their abutting sides are equal and have positive strength. Each supertile induces a binding graph, a grid graph whose vertices are tiles, with an edge between two tiles if they interact. The supertile is $\tau$-stable if every cut of its binding graph has strength at least $\tau$, where the weight of an edge is the strength of the glue it represents. That is, the supertile is stable if at least energy $\tau$ is required to separate the supertile into two parts. A tile assembly system (TAS) is a triple $\mathcal{T} = (T, \sigma, \tau)$, where $T$ is a finite tile set, $\sigma$ is an initial seed configuration and $\tau \in \mathbb{N}$ is the temperature. Given a TAS $\mathcal{T} = (T, \sigma, \tau)$, a supertile is producible if it is a single tile from $T$, if it is $\sigma$, or if it is the $\tau$-stable result of translating two producible assemblies. A supertile $\alpha$ is terminal if for every producible supertile $\beta$, $\alpha$ and $\beta$ cannot be $\tau$-stably attached. A TAS is directed (a.k.a. deterministic or confluent) if it has only one terminal, producible supertile and all producible assemblies are finite. Given a connected shape $X \subseteq \mathbb{Z}^2$, a TAS $\mathcal{T}$ produces $X$ uniquely if every producible, terminal supertile places tiles only on positions in $X$ (appropriately translated if necessary).
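The definition of $\tau$-stability translates directly into a brute-force check: build the binding graph and confirm that every cut has weight at least $\tau$. The Python sketch below assumes a hypothetical representation in which each tile is a dictionary mapping the sides `"N"`, `"E"`, `"S"`, `"W"` to `(label, strength)` glues. It enumerates every bipartition, so it is exponential in the number of tiles and only suitable for tiny examples; a practical implementation would use a global minimum-cut algorithm such as Stoer-Wagner instead.

```python
from itertools import combinations

Glue = tuple[str, int]
Tile = dict[str, Glue]  # side ("N", "E", "S", "W") -> (label, strength)

# For the east and north sides: neighbour offset and the neighbour's abutting side.
ABUTTING = {"E": ((1, 0), "W"), "N": ((0, 1), "S")}


def binding_edges(supertile: dict[tuple[int, int], Tile]):
    """Edges (p, q, strength) between adjacent tiles whose abutting glues are equal and positive."""
    edges = []
    for (x, y), tile in supertile.items():
        for side, ((dx, dy), other_side) in ABUTTING.items():
            q = (x + dx, y + dy)
            if q in supertile:
                label, strength = tile[side]
                if strength > 0 and supertile[q][other_side] == (label, strength):
                    edges.append(((x, y), q, strength))
    return edges


def is_tau_stable(supertile: dict[tuple[int, int], Tile], tau: int) -> bool:
    """True if every cut of the binding graph has total strength >= tau."""
    positions = list(supertile)
    if len(positions) < 2:
        return True  # nothing to cut
    edges = binding_edges(supertile)
    first, rest = positions[0], positions[1:]
    # Keep `first` on one side so each proper bipartition is tried exactly once.
    for k in range(len(rest)):
        for others in combinations(rest, k):
            side = {first, *others}
            cut = sum(s for p, q, s in edges if (p in side) != (q in side))
            if cut < tau:
                return False
    return True
```

For instance, two tiles joined by a single strength-4 glue are 4-stable, whereas the same pair joined by a strength-1 glue is not.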




7.2 RNA Tiles and the RNAse Enzyme 



In this paper, we assume that each tile type is defined as being either composed of 
DNA or of RNA. By careful selection of the actual nucleotides used to create the 
glues, tile types of any combination of compositions can bind together. However, 
the utility of RNA-based tile types comes from the fact that, at prescribed
points during the assembly process, the experimenter can add an RNAse enzyme 
to the solution which causes all tiles composed of RNA to dissolve. We assume 
that when this occurs, all portions of all RNA tiles are completely dissolved, 
including glue portions that may be bound to DNA tiles, returning the previously 
bound edges of those DNA tiles to unbound states. 

In other words, for a given supertile $\alpha$ that is stable at temperature $\tau$, when the RNAse enzyme is added to the solution, all positions in $\alpha$ which are occupied by RNA tiles become undefined (locations at which no tiles exist). The resultant supertile may not be $\tau$-stable and thus defines a multiset of subsupertiles consisting of the maximal stable supertiles of $\alpha$ at temperature $\tau$, which we denote as $\mathrm{BREAK}_\tau(\alpha)$.
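The effect of the RNAse enzyme can be sketched in the same brute-force style. The Python code below is an illustration under assumptions of our own: a supertile is given as a set of positions, the subset of positions occupied by RNA tiles, and its binding-graph edges `(p, q, strength)`. After deleting the RNA tiles and every glue bond touching them, it repeatedly splits the remainder along any cut of weight below $\tau$. A weak cut can never separate a $\tau$-stable piece (it would induce a cut of weight below $\tau$ inside that piece), so the pieces that remain are exactly the maximal $\tau$-stable subsupertiles, i.e. $\mathrm{BREAK}_\tau(\alpha)$. The search is exponential and meant only for tiny examples.

```python
from itertools import combinations

Cell = tuple[int, int]
Edge = tuple[Cell, Cell, int]  # (p, q, glue strength) in the binding graph


def weak_side(nodes: set[Cell], edges: list[Edge], tau: int):
    """One side of a cut of `nodes` with weight < tau, or None if `nodes` is tau-stable."""
    ordered = sorted(nodes)
    first, rest = ordered[0], ordered[1:]
    inner = [(p, q, s) for p, q, s in edges if p in nodes and q in nodes]
    for k in range(len(rest)):
        for others in combinations(rest, k):
            side = {first, *others}
            if sum(s for p, q, s in inner if (p in side) != (q in side)) < tau:
                return side
    return None


def split_stable(nodes: set[Cell], edges: list[Edge], tau: int) -> list[set[Cell]]:
    """Recursively cut along weak cuts until every piece is tau-stable."""
    side = weak_side(nodes, edges, tau)
    if side is None:
        return [set(nodes)]
    return split_stable(side, edges, tau) + split_stable(nodes - side, edges, tau)


def rnase_break(positions: set[Cell], rna: set[Cell], edges: list[Edge], tau: int):
    """Delete RNA tiles and the bonds attached to them, then return the stable pieces."""
    dna = positions - rna
    kept = [(p, q, s) for p, q, s in edges if p in dna and q in dna]
    return split_stable(dna, kept, tau) if dna else []
```

Dissolving the RNA tile in the middle of a three-tile row held together by strength-4 glues, for example, leaves two separate single DNA tiles.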

Unless explicitly stated otherwise, in this paper we subscribe to the restriction that the RNAse enzyme must be added exactly once, and only after an initial phase of two-handed self-assembly (involving both DNA and RNA tiles at temperature $\tau$) reaches some (intermediate) terminal state. Of course, after the RNAse enzyme has completely dissolved all of the RNA tiles, self-assembly of only DNA tiles is allowed to proceed until a final terminal state is reached. We also assume that tile types cannot be added at any point of the self-assembly process, whence all of our constructions in this paper have $O(1)$ stage complexity.

The reader is encouraged to consult [1] for a thorough discussion of the 
RNAse enzyme self-assembly model. 

7.3 Proof Sketch of Theorem 1 

Proof (sketch). Let $6 < n \in \mathbb{N}$. We will describe our construction in terms of Figure 2. It is important to note that every (super)tile attaches in a planar manner.

Verification of the North, West and South Sides of Input Shape: 

First, the red tiles form a 4-stable supertile which binds to the upper-left corner of the input shape. Note that two-handed assembly is required for this to happen since, in the shape identification problem, we cannot assume that the corners of the input shape are marked in any special way. Then the orange tiles count from $2^{\lfloor \log n \rfloor + 1} - n$ up to $2^{\lfloor \log n \rfloor + 1} - 1$ using a slightly modified version of an optimal binary counter used in [3]. In this construction, we encode $2^{\lfloor \log n \rfloor + 1} - n$ using $\lfloor \log(2^{\lfloor \log n \rfloor + 1} - n) \rfloor + 1 = O(\log n)$ unique tile types that assemble a right-triangle of orange tiles whose topmost row encodes the number $2^{\lfloor \log n \rfloor + 1} - n$.
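The counter bounds above can be checked numerically. The Python sketch below assumes base-2 logarithms and uses a hypothetical helper name; it computes the start value $2^{\lfloor \log n \rfloor + 1} - n$, the end value $2^{\lfloor \log n \rfloor + 1} - 1$, the $\lfloor \log(2^{\lfloor \log n \rfloor + 1} - n) \rfloor + 1$ bits needed to write the start value, and the number of values the counter passes through. It models only the arithmetic, not the tiles of the counter from [3].

```python
def orange_counter(n: int) -> tuple[int, int, int, int]:
    """Arithmetic of the orange binary counter for a target n x n square (n >= 1)."""
    k = n.bit_length()               # floor(log2 n) + 1
    end = 2 ** k - 1                 # largest k-bit number: the count stops here
    start = 2 ** k - n               # seed value, with 1 <= start <= 2^(k-1)
    start_bits = start.bit_length()  # floor(log2 start) + 1, which is O(log n)
    values = end - start + 1         # number of values counted through
    return start, end, start_bits, values
```

For $n = 7$ (the example of Fig. 2) this returns `(1, 7, 1, 7)`: the count runs from 1 to 7, so the counter passes through exactly $n$ values, and in general `values` equals `n` because $(2^k - 1) - (2^k - n) + 1 = n$.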

The last (topmost) row of orange tiles detects the end of the count. The green "filler" tiles can then fill in the corner that is created by the two groups of orange tiles (the leftmost west-growing group of orange tiles is simply a rotated version of the counter described above). While the orange and green tiles are assembling, the yellow tiles assemble a rectangle of height $n + \lfloor \log n \rfloor + 2$ and width $n$ on top of the input shape (this might not be possible if the input shape is not an $n \times n$ square). If the topmost row of orange tiles is flush with the topmost row of the yellow tiles, then a row of blue tiles assembles on top of the yellow tiles (from left to right) until the rightmost column of the yellow tiles is reached. At this point, the blue tiles assemble back to the left to meet up with the oppositely oriented group of (north-growing) blue tiles associated with the left side of the input shape. If this meeting occurs, then the blue "YES" tile attaches and waits for the third single-tile-wide path of blue tiles associated with the south side of the input shape to assemble. If all three sides of the input shape agree, then a "YES" indigo tile initiates the assembly of a row of indigo tiles that grow back to the right across the top of the assembly and eventually down the right side of the (north-growing) yellow tiles until encountering the row of "G" yellow tiles.

Assembly of the Border: While the self-assembly described in the previous paragraph is taking place, the red "lower-right" corner tiles attach to the input shape (of course, this might not happen if the input shape is not a square). Then, and only once the downward-growing indigo tiles (mentioned above) reach the yellow "G" row of tiles, a group of east-growing indigo tiles counts from $2^{\lfloor \log n \rfloor + 1} - (n - 3) - 1$ up to $2^{\lfloor \log n \rfloor + 1} - 1$ using the same modified optimal counting scheme as above. We encode $2^{\lfloor \log n \rfloor + 1} - (n - 3) - 1$ using $\lfloor \log(2^{\lfloor \log n \rfloor + 1} - (n - 3) - 1) \rfloor + 1 = O(\log n)$ unique tile types. When the indigo counter reaches its maximum value,
a group of violet tiles assembles so that its left edge is n — 3 tiles tall and whose 
top edge is flush with that of the input shape. As the violet tiles are assembling, 
the grey "U" tiles attach to the input shape one at a time in a planar fashion. 
The main job of the "U" tiles is to determine if the right side of the input shape 
is consistent with that of an n x n square if it is, then the input shape must 
be a square. See Figure 6 for an illustration of the order of assembly in our 
construction. Note that, independent of the value of n, only the last two (the 
southernmost) tiles to be placed in this column will be of the types "U2" and 
"U3" , and there may be many more "U" tiles. These final two tiles are necessary 
to stabilize the entire column after RNAse is added and can only be placed if 
the east side, which is the final side to be checked, also matches that of an n x n 
square. 
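The counter arithmetic above can be checked with a short Python sketch. It assumes base-2 logarithms and treats $l$ as a free integer parameter (its value is fixed earlier in the construction). The helper name `indigo_counter` is hypothetical, and the sketch only tracks the integer values the counter passes through, not the tiles or rows that realize them.

```python
def indigo_counter(n: int, l: int) -> dict:
    """Arithmetic of the east-growing indigo counter (hypothetical helper).

    Assumes base-2 logs and that l is the parameter fixed earlier in the
    construction. Only integer values are modeled, not tiles or rows.
    """
    if n < 1:
        raise ValueError("n must be positive")
    k = n.bit_length()              # equals floor(log2 n) + 1 for n >= 1
    stop = 2 ** k - 1               # all-ones k-bit value: the counter's maximum
    start = 2 ** k - 3 - l          # value the counter is seeded with
    if start < 1:
        raise ValueError("start value must be positive for this n and l")
    seed_bits = start.bit_length()  # floor(log2 start) + 1 seed tile types
    return {
        "start": start,
        "stop": stop,
        "seed_bits": seed_bits,     # never more than k, hence O(log n)
        "values": stop - start + 1, # always l + 3
        "bit_width": k,
    }


if __name__ == "__main__":
    info = indigo_counter(10, 5)
    print(info)
    # The counter passes through start, start + 1, ..., 2^k - 1 in k-bit binary.
    for v in range(info["start"], info["stop"] + 1):
        print(format(v, f"0{info['bit_width']}b"))
```

For n = 10 and l = 5 the counter is seeded with 8 (four bits) and stops at 15, so the seed needs no more bits than $\lfloor \log n \rfloor + 1$, which is where the $O(\log n)$ bound comes from.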

After the "U2" and "U3" tiles attach to the right border of the input shape, 
this stage of the assembly is terminal and we add the RNAse enzyme to dissolve 
all of the RNA tiles (those that are not part of the input shape or have labels 
containing "U"). Finally, the remaining grey tiles attach to the left, top and 
bottom borders of the input shape as shown in Figure 7(b) in a clockwise fashion. 

Note that if the input shape does not match the target square, then there are several situations that might occur. First, the topmost orange tile will either be too high or not high enough, whence it will not be able to cooperate with the leftmost tile in the top row of the north-growing yellow tiles. This means that the rows of blue tiles will never cooperate and "agree" that three of the sides of the input shape are actually consistent with those of an n x n square. As a result, when the RNAse enzyme is added, there will be no "U" tiles attached to




(a) All of these groups of tiles can assemble in parallel.

(b) The left and top borders "agree" on having the same dimension via the blue paths of tiles.

(c) The bottom border "agrees" with whatever dimension the top and left borders agreed upon via a final blue path of tiles.

(d) After (and only if) the left, top and bottom borders agree on having the same dimension, the indigo path initiates the violet group of tiles that determine whether the right side of the input shape is consistent with a square.

Fig. 6: The order of assembly is red, orange, yellow, green, blue, indigo and violet. The little black arrows represent the order of assembly of single-tile-wide paths.




(a) Right after the RNAse enzyme is added.

(b) Eventually, the grey DNA tiles assemble around the border.



Fig. 7: This is what happens after we add the RNAse enzyme to the assembly 
from Figure 2. 



the border of the input shape. Second, if the right border of the input shape is not consistent with the target square, then the grey "U" tiles will search for the point at which the right border becomes inconsistent with a square, and no further "U" tiles can attach past this point. When the RNAse enzyme is added, the previously attached "U" tiles will not bind with sufficient strength and will ultimately fall off of the input shape. This situation is illustrated in Figure 8.

It is worth noting that our construction for Theorem 1 satisfies a stronger (and perhaps more "experimentally realistic") formulation of the shape identification problem, in which the seed assembly is not a single input shape but instead a (possibly infinite) collection of input shapes, and the goal is to correctly identify all copies of the target shape.

Finally, each portion of the construction whose tile complexity was mentioned above requires O(log n) tile types, and every other component uses a constant-size tile set, independent of n. Therefore, the resulting construction requires O(log n) tile types.

7.4 Proof Sketch of Theorem 3 

Proof (sketch). We will employ an ingenious optimal encoding scheme that was introduced by Adleman, Cheng, Goel, and Huang [3] and later modified by Soloveichik and Winfree [26] to work at temperature τ = 2. This method uses $O\left(\frac{n}{\log n}\right)$ unique tile types to encode the bit string $x = x_0 x_1 \cdots x_{n-1}$, whence Theorem 3 follows. Intuitively, the optimal encoding scheme expresses x as the concatenation of $\lceil n/k \rceil$ strings, each of length $k \in \mathbb{N}$, where k is the



(a) The "U" tiles try to attach along the right border.

(b) After adding the RNAse enzyme: notice that the outlined (bottommost) "U" tile binds with strength 3 < τ = 4.

(c) The "U" tiles fall off one at a time until these two outlined "U" tiles remain, which collectively bind with total strength 2 < τ = 4.

Fig. 8: A "no" instance of the shape identification problem. Notice that there is a single tile missing in the rightmost column of the input shape, making it a non-square input shape.



smallest number satisfying $\frac{n}{\log n} \le 2^k$. Each length-k substring is encoded as a unique seed tile type, and these seed tiles collectively assemble into a seed row of tiles. An "unpacking" procedure is carried out in rows of tiles above this initial seed row (as illustrated in Figure 9) until the bits of x are fully unpacked. The reader is encouraged to consult [3, 26] for a detailed analysis of this optimal encoding scheme.
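The chunking step of this encoding can be sketched in Python. The sketch assumes base-2 logarithms and the condition $n/\log n \le 2^k$ for choosing k; the function names are hypothetical, and the default padding symbol `"1"` is an assumption. It models only how x is split into seed chunks, not the unpacking tile set of [3, 26].

```python
import math


def choose_k(n: int) -> int:
    """Smallest k with n / log2(n) <= 2**k (assumes n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    target = n / math.log2(n)
    k = 0
    while 2 ** k < target:
        k += 1
    return k


def split_into_chunks(x: str, k: int, pad: str = "1") -> list:
    """Left-pad x to a multiple of k and cut it into ceil(n/k) length-k chunks.

    Each chunk corresponds to one seed tile type in the seed row.
    """
    m = -(-len(x) // k)             # ceil(len(x) / k) without floating point
    padded = x.rjust(m * k, pad)    # padding symbols go on the left
    return [padded[i * k:(i + 1) * k] for i in range(m)]


if __name__ == "__main__":
    print(split_into_chunks("10100110", 4))   # the string unpacked in Figure 9
    for n in (8, 1024, 2 ** 20):
        k = choose_k(n)
        # n, k, number of seed chunks, and 2^k: both of the last two grow like n / log n
        print(n, k, -(-n // k), 2 ** k)
```

Figure 9 uses k = 4 purely for illustration; under the stated condition an 8-bit string would get k = 2. The point of the rule is that both the number of chunks $\lceil n/k \rceil$ and $2^k$ stay within a constant factor of $n/\log n$.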




Fig. 9: An example of how the string x = 10100110 is unpacked. In this example, 
k = 4. Note that this image is essentially Figure 5.7 from [26] with many of the 
details omitted. 



Note that the north glues of the final (topmost) row of the unpacking process 
each advertise a corresponding bit of the input string x (padded with leading 
1's). These bits are used to seed all of the modified optimal binary counters in
our construction. After this unpacking process completes, the construction for 
Theorem 1 can proceed normally. 

7.5 Proof Sketch of Theorem 4 

Proof (sketch). We implicitly define our universal tile set T in terms of Figure 10. 
Now let $n \in \mathbb{N}$ with $n > 6$. Our construction proceeds as follows.

The orange unary counters (a.k.a., shifters) extract (in unary) the dimensions 
of the north, west and south sides of the input shape in parallel. The single-tile- 
wide yellow path of tiles attaches to the upper right corner of the input shape 
as well as to the right column of the north-growing unary counter. Yellow tiles 
grow to the north searching for the top of the aforementioned north-growing 
orange unary counter. When the yellow tiles find the top row, they initiate the 
self-assembly of the green tiles whose purpose is to search for the column directly 
to the right of the leftmost column of the north-growing orange unary counter.
Then, the green tiles initiate the growth of a west-growing unary green counter 







Fig. 10: Overview of universal square identification scheme. Similar to Figure 2, 
the order of assembly is: red, orange, yellow, green, blue, indigo and violet. The 
border of DNA tiles assembles exactly the same as depicted in Figure 7 except 
the "U0" tile is not used in this construction. 



that assembles a square (with dimensions equal to the length of the north side of the input shape) to the left of the north-growing orange unary counter. If the input shape is a square, then the leftmost tile in the bottommost row of this green (west-growing) unary counter will be positioned precisely one tile above and to the left of the leftmost tile in the topmost row of the west-growing orange unary counter. This allows a south-growing path of green tiles to proceed until the bottommost row of the west-growing orange unary counter is encountered. Then, a south-growing blue unary counter assembles a square (with dimensions equal to the length of the west side of the input shape) to the south of the west-growing orange unary counter. As before with the west-growing green unary counter, we force the bottommost tile in the rightmost row of this south-growing blue unary counter to be exactly one tile below and to the left of the leftmost tile in the bottommost row of the south-growing orange unary counter. In this corner, the growth of a single-tile-wide path of indigo tiles is initiated. This path of indigo tiles assembles along the outside of the entire assembly in a clockwise fashion until reaching the upper-right green tiles. When the indigo tiles reach the aforementioned green tiles, an east-growing violet unary counter assembles to the right of the north-growing orange unary counter. When the final (lower-rightmost) violet unary counter tile attaches, a triangle whose left side has length equal to that of the north side of the input shape minus three assembles to the south of the previously assembled east-growing violet unary counter (note that the leftmost column of this final violet square is flush with the leftmost column of the violet unary counter to which it attaches, thus leaving a single-tile-wide path for DNA border tiles to attach). At this point, DNA border tiles attach exactly as they do in the construction for Theorem 1, with the exception that the "U0" tile is never used.

7.6 Proof Sketch of Theorem 4 

Proof (sketch). Assume that X has a nontrivial perimeter-rectangle decomposition with $n \ge 4$ rectangles (at least two on the north side and at least two on the south side of the shape), as the special cases for $n < 4$ are easy to handle.

We define $\mathrm{bin} : \mathbb{N} \to \{0,1\}^*$ as the standard binary (a.k.a., base-2) representation of a natural number. Let $R_0, R_1, \ldots, R_{l-1}$ be the rectangles whose south edges touch the south edge of R(X). We can assume without loss of generality that for all $i, j \in \{0, 1, \ldots, l-1\}$, $h(R_i) \neq h(R_j)$ whenever $|i - j| = 1$. We will define strings $x_0, x_1, \ldots, x_{l-1} \in \Sigma^*$, where $\Sigma = \{0,1\}$, that encode the heights of $R_0, R_1, \ldots, R_{l-1}$ (padded with leading 0's), respectively. First, define $x_0 = x'_0 x''_0$, where $x''_0 = \mathrm{bin}(h(R_0) - 3)$ and

$$x'_0 = \begin{cases} 0^{w(R_0) - 1 - |\mathrm{bin}(h(R_0) - 3)|} & \text{if } h(R_0) > h(R_1) \\ 0^{w(R_0) + 1 - |\mathrm{bin}(h(R_0) - 3)|} & \text{otherwise.} \end{cases}$$

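We assume that, for any alphabet symbol a and $q < 0$, the expression $a^q$ evaluates to the empty string λ. For $0 \le i < j < k \le l - 1$ with $i = j - 1$ and $k = j + 1$, define $x_j = x'_j x''_j$, where $x''_j = \mathrm{bin}(h(R_j) - 3)$ and

$$x'_j = \begin{cases} 0^{w(R_j) + 2 - |\mathrm{bin}(h(R_j) - 3)|} & \text{if } h(R_i) > h(R_j) < h(R_k) \\ 0^{w(R_j) - 2 - |\mathrm{bin}(h(R_j) - 3)|} & \text{if } h(R_i) < h(R_j) > h(R_k) \\ 0^{w(R_j) - |\mathrm{bin}(h(R_j) - 3)|} & \text{otherwise.} \end{cases}$$

Finally, define $x_{l-1} = x'_{l-1} x''_{l-1}$, where $x''_{l-1} = \mathrm{bin}(h(R_{l-1}) - 3)$ and

$$x'_{l-1} = \begin{cases} 0^{w(R_{l-1}) + 2 - |\mathrm{bin}(h(R_{l-1}) - 3)|} & \text{if } h(R_{l-2}) > h(R_{l-1}) \\ 0^{w(R_{l-1}) - |\mathrm{bin}(h(R_{l-1}) - 3)|} & \text{otherwise.} \end{cases}$$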



We encode the heights of the rectangles whose north edges touch the north edge of R(X) as the strings $y_0, y_1, \ldots, y_{m-1} \in \Sigma^*$ in a similar fashion. Let f be some computable function satisfying

$$f(\langle X \rangle) = y_0 y_1 \cdots y_{m-1} \,\#\, \mathrm{bin}(h(R(X))) \,\#\, x_0 x_1 \cdots x_{l-1}.$$

We can assume that the leftmost and rightmost bits of each $y_i$ for $0 \le i < m$ and each $x_i$ for $0 \le i < l$ are specially marked, so that the least and most significant bits of the binary counters that they will ultimately seed can carry unique signals based on whether or not they will help form a convex or concave corner with their neighboring rectangles (doing this will allow the border of DNA tiles to uniquely assemble if and only if the input shape matches the target shape). Note that in our construction we use a special separator character # to surround the bit string $\mathrm{bin}(h(R(X)))$ so as to distinguish it as the bit string that describes the height of R(X).
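A minimal Python sketch of the string layout may help make the padding rules concrete. It assumes the exponent of 0 in $x'_j$ is $w(R_j)$ plus an offset of -1/+1 for $x_0$, +2/-2/0 for the middle strings and +2/0 for $x_{l-1}$, following the case analysis for $x_0$, $x_j$ and $x_{l-1}$. All function names are hypothetical, heights are assumed to be at least 3, and the special corner marks on the first and last bits are omitted.

```python
def bin_(v: int) -> str:
    """Standard binary representation bin(v); defined only for v >= 0."""
    if v < 0:
        raise ValueError("bin is only defined on natural numbers")
    return format(v, "b")


def pad_offset(heights: list, j: int) -> int:
    """Offset added to w(R_j) in the exponent of 0 in x'_j (assumed case analysis)."""
    l = len(heights)
    h = heights[j]
    if j == 0:
        return -1 if h > heights[1] else 1
    if j == l - 1:
        return 2 if heights[l - 2] > h else 0
    left, right = heights[j - 1], heights[j + 1]
    if left > h < right:    # R_j is lower than both neighbors
        return 2
    if left < h > right:    # R_j is higher than both neighbors
        return -2
    return 0


def encode_south(heights: list, widths: list) -> list:
    """Strings x_0, ..., x_{l-1} for the south-touching rectangles R_0, ..., R_{l-1}."""
    l = len(heights)
    if l < 2 or len(widths) != l:
        raise ValueError("need at least two rectangles and one width per rectangle")
    if any(a == b for a, b in zip(heights, heights[1:])):
        raise ValueError("neighboring rectangles must have different heights")
    out = []
    for j in range(l):
        core = bin_(heights[j] - 3)                          # x''_j
        q = widths[j] + pad_offset(heights, j) - len(core)   # exponent of 0 in x'_j
        # Python yields "" for "0" * q when q <= 0, matching a^q = empty string.
        out.append("0" * q + core)
    return out


def f_string(ys: list, xs: list, total_height: int) -> str:
    """y_0 ... y_{m-1} # bin(h(R(X))) # x_0 ... x_{l-1}, without corner marks."""
    return "".join(ys) + "#" + bin_(total_height) + "#" + "".join(xs)
```

For example, three rectangles with heights 5, 8, 4 and widths 3, 2, 4 give the strings `0010`, `101` and `000001`. The middle rectangle is a peak, so its padding exponent is negative and it receives no leading zeros.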

Our construction is broken up into three logical phases: unpacking, frame 
assembly and border assembly. 

Unpacking Phase: In the unpacking phase, we encode a program $\pi_X$ that outputs $\langle X \rangle$ using the optimal encoding scheme of Soloveichik and Winfree [26] (also described at a high level in Section 3.3) and then use a fixed universal machine U to simulate $\pi_X$; this requires $O\left(\frac{|\pi_X|}{\log |\pi_X|}\right)$ unique tile types. Next, on top of that, a new Turing machine simulation uses $\langle X \rangle$ as input and computes $f(\langle X \rangle)$, so that the topmost row of the assembly (i.e., the final configuration of the Turing machine that computes f) encodes all of the information that is necessary to assemble a frame for the target shape X (the blue, green and orange bars in Figure 11(a)).

Frame Assembly Phase: We assemble the frame that accepts the target shape X as follows. First, the portion of the frame that accepts the south-facing features of X is assembled using a series of north-growing binary subtractors (similar to the optimal binary counter used in [3]). The final row in each of the subtractors assembles from right to left and attaches appropriate corner tiles that advertise whether or not they form a concave or convex corner with any neighboring subtractors (note that this information was computed by f and is encoded in the strings $x_0, x_1, \ldots, x_{l-1}$).




(a) The program $\pi_X$ (the yellow bar) is decompressed into $\langle X \rangle$ and $f(\langle X \rangle)$ is subsequently computed. The blue, green and orange bars collectively represent the string $f(\langle X \rangle)$.

(b) The portion of the frame that describes the south-facing features of the target shape is assembled; the green rectangle is a binary subtractor that assembles $h(R(X))$ rows.

(c) The information that describes the north-facing features of the target shape is correctly positioned and then assembled.

(d) Finally, the right border of the frame is assembled and the assembly can correctly identify the target shape.

Fig. 11: Assembly sequence for the unpacking and frame assembly phases.



Fig. 12: A slightly more detailed version of a portion of Figure 11(d). The target shape X is the grey structure with single-strength glues along its border. The light orange rectangles represent binary subtractors (seeded by corresponding portions of dark orange tiles along the bottom row) that assemble portions of the input frame that accept the south-facing features of the target shape. Notice that we mark the left and right corners of the topmost row of each subtractor with tiles that indicate whether a particular corner is concave or convex. We also mark the top-left and bottom-right corners with special pink tiles (these tiles will be used in the border assembly phase). South-growing counters that assemble the portion of the frame that accepts the north-facing features of X are constructed similarly.



Then, the left border of the frame (simply a wall of tiles whose height is equal to that of R(X)) is constructed via a binary subtractor that assembles $h(R(X))$ rows. The glues on the east side of every tile in the rightmost column of this subtractor uniformly advertise (to the border tiles that will assemble later) that they are part of the frame.

Next, the portion of the frame that accepts the north-facing features of X is
constructed using south-growing binary subtractors similar to how the portion 
of the frame that accepts the south-facing features was constructed. The infor- 
mation describing these features is first copied up along the left side of the frame 
and then rotated 180 degrees clockwise so that they are correctly oriented and 
positioned directly above their north-growing counterparts. 

Finally, the right side of the frame is assembled using a single-tile-wide south-growing path of generic frame border tiles that eventually "bump into" the
portion of the assembly in which the unpacking process was carried out. All of 
the glues on the west side of these tiles uniformly advertise (to the border tiles 
which will assemble later) that they are part of the frame. Figure 12 depicts a 
fully-assembled frame enclosing a particular target shape. 

Border Assembly Phase: In the border assembly phase, we must assemble a complete and fully connected border of DNA tiles if and only if the input shape matches the target shape. Since the system temperature is τ = 4, we can force DNA border tiles to cooperate with (1) the most recently attached border tile (with strength 2), (2) the frame (with strength 1) and (3) a single glue on the input shape (with strength 1). Essentially, if a DNA border tile cannot cooperate with an already-attached border tile, a frame tile and an input shape tile simultaneously, then it will not have sufficient strength to bind, except in the special case when a border tile must attach in a concave corner. The border assembly phase of an instance of the shape identification problem in which the input shape matches the target shape is depicted in Figure 13.
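The cooperative attachment rule can be illustrated with a small Python sketch that walks one border path position by position. It assumes each position either does or does not expose a frame glue and an input-shape glue, that the first tile is anchored (for instance by a corner gadget), and it ignores the concave-corner special case. The names and the boolean-flag model are hypothetical.

```python
TAU = 4          # system temperature
BORDER_BOND = 2  # glue to the previously attached DNA border tile
FRAME_BOND = 1   # glue to the frame
SHAPE_BOND = 1   # glue to the input shape


def grow_border(frame_glue: list, shape_glue: list, tau: int = TAU) -> int:
    """Number of DNA border tiles that attach, in order, along one border path.

    Position 0 is assumed to be anchored; every later position needs
    BORDER_BOND + FRAME_BOND + SHAPE_BOND >= tau. Concave corners are not modeled.
    """
    if len(frame_glue) != len(shape_glue):
        raise ValueError("need one frame flag and one shape flag per position")
    if not frame_glue:
        return 0
    attached = 1
    for i in range(1, len(frame_glue)):
        strength = BORDER_BOND + FRAME_BOND * frame_glue[i] + SHAPE_BOND * shape_glue[i]
        if strength < tau:
            break  # growth stalls here; later positions are never reached
        attached += 1
    return attached
```

With a matching input shape every position reaches strength 4 and the whole path attaches; if the input shape lacks a tile at position 3, only positions 0 through 2 attach, because the tile at position 3 would bind with strength 3 < τ.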

Following the assembly of the border tiles, the assembly will then be terminal 
and the RNAse enzyme is added to dissolve away all of the RNA tiles. If the 
input shape does not match the target shape X then we must ensure that the 
border of the input shape will eventually be free of any DNA tiles. Because of the way in which DNA border tiles are constrained to attach, we can ensure that, unless the border completely assembles, there is always at least one red or purple DNA border tile that binds to other DNA border tiles and the input shape with strength 3 < τ = 4, which means that it will detach from the input shape once the RNA tiles are dissolved. Once this first tile dissociates, there will exist another DNA border tile that binds with only strength 3. This process of disassembling the border one tile at a time (except for purple tiles, which can dissociate in groups) continues until no DNA border tiles remain attached to the input shape. This situation is depicted in Figure 14.
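The one-tile-at-a-time disassembly argument can be simulated with a short Python sketch. It assumes a simplified model in which, after the RNA frame dissolves, each DNA border tile is held only by strength-2 bonds to neighboring border tiles and a strength-1 bond to the input shape. Group dissociation of purple tiles is not modeled, and the tile labels and function name are hypothetical.

```python
def dissociation_order(bonds: dict, shape_bond: dict, tau: int = 4) -> list:
    """Order in which DNA border tiles fall off once the RNA frame is gone.

    bonds maps a pair (a, b) of border tiles to the strength of the glue between
    them; shape_bond maps each border tile to its strength toward the input
    shape. At each step one tile whose total remaining strength is below tau
    detaches (the smallest label is chosen to make the order deterministic).
    """
    attached = set(shape_bond)

    def strength(t):
        s = shape_bond[t]
        for (a, b), w in bonds.items():
            # Count a bond only if both of its endpoints are still attached.
            if t in (a, b) and a in attached and b in attached:
                s += w
        return s

    order = []
    while True:
        weak = sorted(t for t in attached if strength(t) < tau)
        if not weak:
            return order
        attached.remove(weak[0])
        order.append(weak[0])
```

In this model a partial border (an open chain of five tiles) has two end tiles of strength 2 + 1 = 3 < 4, so the whole chain peels away one tile at a time. A closed ring of the same tiles gives every tile strength 2 + 2 + 1 = 5 and nothing detaches.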



The number of unique tile types required for the unpacking phase is $O\left(\frac{|\pi_X|}{\log |\pi_X|}\right)$, while the number of tile types required for the remaining phases of the construction is a constant. Hence, the identification complexity of X with respect to C is $O\left(\frac{|\pi_X|}{\log |\pi_X|}\right)$. ■

(a) The input shape is attached to the frame (with strength τ = 4) via the two purple corner gadgets. The assembly of the DNA border tiles can now proceed in a counter-clockwise fashion starting from the "L3" and "J3" tiles.

(b) The red DNA border tiles assemble sequentially in a counter-clockwise fashion. Notice that each DNA border tile cooperates with one bond each from the input shape, a previously attached DNA border tile and the frame (except for corner tiles). This ensures that when the frame is dissolved by the RNAse enzyme, the border will be complete and fully connected.

Fig. 13: The border assembly phase. The purple and red tiles are DNA tiles.

(a) In this case, only a portion of the leftmost path of red DNA border tiles can assemble, which means that at least one DNA border tile binds to other border tiles and the input shape with strength 3 < τ = 4.

(b) After the RNAse enzyme is added and the frame dissolves, the partial border begins to disassemble one tile at a time (except for purple tiles, which can dissociate in groups).

(c) The order in which all of the DNA border tiles in (b) eventually dissociate.

Fig. 14: The border assembly phase in which the input shape is not the same as the target shape.

Note that the restriction that X be x-monotone is not necessary in the sense 
that the east and west sides of the input shape can have features similar to those 
of the north and south sides if the construction is extended in the obvious way. 
However, to make the class of identifiable shapes more straightforward to define 
and the construction easier to explain, we have presented the construction with 
that constraint. 

7.7 Discussion of Theorem 5 




Fig. 15: A compact description of all of the rectangle supertiles that make up 
the frame is unpacked by a Turing machine. Then a constant size tile set is 
used to convert each description into a rectangle supertile with the appropriate 
connection interface. 



Prior to the first dissolve stage, we assemble pieces that will themselves eventually combine in a two-handed fashion to form the frame that accepts X, using a collection of rectangular supertiles of arbitrary dimension (we use a Turing machine to compute the dimensions of these rectangles from an algorithmic description of X). See Figure 4(a) for an intuitive depiction of this process. After the first type of RNAse enzyme is added, all of the rectangles assemble in a two-handed fashion and connect via "binary teeth" interfaces along their left and right sides (see [1,9,10] for examples of this well-known two-handed assembly technique). Note that these connection interfaces can be made arbitrarily



Fig. 16: A closer look at the outlined portion of Figure 15. Here we use a tile 
set to convert the algorithmic description of this particular rectangle supertile 
into an assembly of RNA tiles (of the type dissolved only by the second RNAse 
enzyme to be added). The arrows represent a standard self-assembly "rotation"
scheme in which a horizontal sequence of bits (one bit per tile) is converted into 
a vertical sequence of bits. 




Fig. 17: The rectangle supertile that is left after the first type of RNAse enzyme 
is added. Note that other rectangle supertiles attach via the "!" and "?" tiles 
with total strength 4. 



long depending on how "jagged" the north- and south-facing features of X are.
Once the frame assembles (see Figure 4(b)), a fully-connected border of DNA 
tiles will attach to Y if and only if Y = X in exactly the same fashion as it 
did in the construction for Theorem 4. After all of the border tiles attach, the 
second type of RNAse enzyme is added, dissolving the frame and completing the 
construction. 

It is worth noting that, in the construction discussed above, after the first type of RNAse enzyme is added, the frame that will eventually accept X might be very large depending on how "non-uniformly jagged" the features of X are. This is because the more jagged X is, the longer the rectangular supertiles that assemble into the frame that accepts X must be in order to accommodate a larger number of unique connection interfaces. However, if many of the features of X are "similar," then connection interface patterns can be reused, thus eliminating the need for larger rectangular supertiles. An example of a kind of "worst-case" target shape X might be a comb-like structure with a very long "base" and with each "tooth" slightly taller than the tooth to its left (see Figure 18).
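Why more distinct features force longer interfaces can be sketched in Python under a simple hypothetical model. That model is not the paper's exact interface design: each adjacency between two rectangle supertiles gets its own bit pattern of bumps and gaps, and the neighboring edge carries the complementary pattern. Under this model, m pairwise-distinct interfaces need $\lceil \log_2 m \rceil$ bump positions, and a comb in which every tooth differs allows no pattern to be reused.

```python
def interface_width(num_interfaces: int) -> int:
    """Bump positions needed for num_interfaces distinct patterns: ceil(log2 m), at least 1."""
    if num_interfaces < 1:
        raise ValueError("need at least one interface")
    return max(1, (num_interfaces - 1).bit_length())


def assign_teeth(adjacencies: list):
    """Give each adjacency (left_piece, right_piece) its own binary-teeth pattern.

    The left piece's east edge gets a bump wherever its pattern has a 1, and the
    right piece's west edge gets the complementary pattern, so exactly that pair meshes.
    """
    width = interface_width(len(adjacencies))
    patterns = {}
    for idx, pair in enumerate(adjacencies):
        teeth = format(idx, f"0{width}b")
        sockets = "".join("0" if b == "1" else "1" for b in teeth)
        patterns[pair] = (teeth, sockets)
    return width, patterns


def meshes(teeth: str, sockets: str) -> bool:
    """Two edges mesh only if every bump faces a gap and every gap faces a bump."""
    return len(teeth) == len(sockets) and all(a != b for a, b in zip(teeth, sockets))
```

Because a socket pattern is the complement of exactly one teeth pattern, an east edge meshes only with the west edge it was assigned to. Real two-handed constructions also rely on glue strengths for binding; this sketch models only the geometric complementarity.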

Similar to the previous construction, the number of unique tile types required for the unpacking phase is $O\left(\frac{|\pi_X|}{\log |\pi_X|}\right)$, while the number of tile types required for the remaining phases of the construction is a constant. Hence, the identification complexity of X with respect to C is $O\left(\frac{|\pi_X|}{\log |\pi_X|}\right)$. Also similar to the previous construction, the restriction that X be x-monotone isn't entirely necessary, and with the obvious modifications to the construction, many shapes with features on their east and west sides can also be identified.

Fig. 18: A "worst-case" example target shape X. Note that we cannot reuse any rectangular supertiles that make up the frame that accepts X.



